Gumbel Statistics
ABSTRACT

We describe and discuss the application of Gumbel statistics, which model extreme events, to WMAP 5-year measurements of the cosmic microwave background (CMB). We find that temperature extrema of the CMB are well modelled by the Gumbel formalism, and we describe the tests for Gaussianity that the approach can provide. Comparison to simulations shows that Gumbel statistics have only weak discriminatory power for the conventional non-Gaussianity parameter f_NL (values f_NL < 1000 are unlikely to be detected), though the approach may probe other regimes of non-Gaussianity. Tests based on hemispheric cuts reveal an interesting alignment with other reported CMB anomalies. The approach has the advantage of model independence and may find further utility with smaller-scale data.

Key words: Cosmic Microwave Background 



1 INTRODUCTION 



Although the primary source of information in the cosmic microwave background (CMB) is the angular power spectrum, tests of other statistics provide an important consistency check for cosmological models. The key assumptions of Gaussianity and isotropy of the CMB, particularly of the WMAP data (Limon et al. 2008), have both been challenged by more complex tests. The isotropy of the CMB has been strongly challenged with respect to power distribution (Hansen et al. 2004) and multipole alignment (Land & Magueijo 2007; Copi et al. 2004), and some evidence for non-Gaussianity has been noted among many statistical tests (Vielva et al. 2004; Eriksen et al. 2004; Mukherjee et al. 2004; McEwen et al. 2005). Many such tests use higher-order statistics, moving beyond the correlation of pairs of sky pixels to more complex comparisons. The statistics of CMB extrema have previously been studied by looking at the statistics of hot and cold spots (Larson & Wandelt 2005; Hou, Banday & Górski 2009); here we address the inverse question: given patches of fixed area, what are the statistics of the extremes within those patches? We develop and demonstrate a new statistical approach based on the statistics of extreme events. We find that the test has weak discriminatory power but shows some suggestion of non-Gaussianity at low ℓ.

Gumbel statistics describe the behaviour of sample extrema in the same way that Gaussian statistics describe the behaviour of sample means. For example, if we draw n independent values from a Gaussian distribution N(μ, σ), their average will also be Gaussian-distributed, with mean μ and standard deviation σ/√n. Similarly, the maximum and minimum of that sample will, after suitable rescaling and in the limit of large n, follow a Gumbel distribution (Castillo 2005).

Just as the central limit theorem implies that sample means from any distribution with finite variance will, in the limit, tend to a Gaussian profile, suitably normalised sample extrema will tend to a Gumbel distribution for sufficiently large sample size, whenever such a limiting distribution exists. The distribution is also known in the literature as the von Mises family of distributions (Castillo 2005).

Gumbel statistics have found application in a number of fields, primarily finance and meteorology. In this paper we apply Gumbel statistics to CMB-type maps and discuss their cosmological applications. Section 2 presents the mathematical basis of Gumbel statistics. In Section 3 we apply the formalism to the CMB and to simulated maps to demonstrate our method. Section 4 compares the findings for the CMB with those for simulated Gaussian maps. Section 5 applies Gumbel statistics to the search for potential non-Gaussianities in simulated maps.



2 GUMBEL DISTRIBUTIONS 

The most general Gumbel profile for sample maxima is given by a cumulative distribution function of the form (Castillo 2005; Beirlant et al. 2004):

$$ G_\gamma(y) = \exp\left[-\left(1 + \gamma y\right)^{-1/\gamma}\right] \qquad (1) $$



where y is related to the variable value x via scale and intercept parameters, y = (x − a)/b, and the expression holds where 1 + γy > 0. In the context of the CMB, x is the maximum pixel temperature within a patch. The quantity γ is a shape parameter that can take any real value, depending on the underlying distribution; for γ = 0, Equation (1) is understood as its limit, G_0(y) = exp(−e^{−y}). Figure 1 plots the probability density functions corresponding to Equation (1) for a selection of γ values. A similar family of distributions exists for the sample minimum, and we shall refer to both collectively as 'Gumbel statistics'.

Figure 1. Selection of Gumbel probability density functions for sample maxima. Plotted values of γ are: −0.8 (dot), −0.4 (dash), 0 (solid), 0.8 (dash-dot).
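Equation (1) and its density can be evaluated directly. The following is a minimal NumPy sketch, assuming the convention y = (x − a)/b given above; the function names `gev_cdf` and `gev_pdf` are our own illustrative choices, not code from this work. The γ = 0 case is handled by its limit, and points outside the support 1 + γy > 0 receive the appropriate constant CDF value (0 below a lower bound when γ > 0, 1 above an upper bound when γ < 0).

```python
import numpy as np


def gev_cdf(x, a=0.0, b=1.0, gamma=0.0):
    """Gumbel CDF of Eq. (1) for sample maxima, with y = (x - a) / b."""
    y = (np.asarray(x, dtype=float) - a) / b
    if gamma == 0.0:
        # gamma -> 0 limit of Eq. (1): the classical Gumbel CDF
        return np.exp(-np.exp(-y))
    t = 1.0 + gamma * y
    safe_t = np.where(t > 0, t, 1.0)  # placeholder avoids invalid powers
    inside = np.exp(-safe_t ** (-1.0 / gamma))
    # Outside the support the CDF is 0 below a lower bound (gamma > 0)
    # and 1 above an upper bound (gamma < 0, the Weibull domain).
    outside = 0.0 if gamma > 0 else 1.0
    return np.where(t > 0, inside, outside)


def gev_pdf(x, a=0.0, b=1.0, gamma=0.0):
    """Density of gev_cdf, i.e. its derivative with respect to x."""
    y = (np.asarray(x, dtype=float) - a) / b
    if gamma == 0.0:
        return np.exp(-y - np.exp(-y)) / b
    t = 1.0 + gamma * y
    safe_t = np.where(t > 0, t, 1.0)
    dens = safe_t ** (-1.0 / gamma - 1.0) * np.exp(-safe_t ** (-1.0 / gamma)) / b
    return np.where(t > 0, dens, 0.0)


# Densities on a coarse grid for the shape values plotted in Figure 1
xs = np.linspace(-3.0, 5.0, 9)
for g in (-0.8, -0.4, 0.0, 0.8):
    print(g, np.round(gev_pdf(xs, gamma=g), 3))
```

For negative γ the density drops to zero at the finite endpoint y = −1/γ, which is the bounded behaviour visible in the dotted and dashed curves of Figure 1.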

The value of γ for many analytic underlying distributions can be calculated rigorously (Castillo 2005; Beirlant et al. 2004). For example, a sample of independent, identically distributed Gaussian variables can be shown to yield γ = 0 for both maxima and minima. In general, though, the γ one calculates will differ between the maxima and minima limit distributions. The distribution of the CMB, on the other hand, is a more complicated multi-dimensional Gaussian with correlated components, and there is no analytic prediction for its Gumbel statistics.

We can, though, estimate the γ parameter by repeatedly taking a 'sample of samples' from a distribution: taking n sets of values from the distribution, each containing m measurements, and recording the maximum in each set. When the number n of such maxima is large enough, the corresponding density function can be fitted to a Gumbel form. The γ of the underlying distribution will be reflected accurately in the data provided that m is large enough for the analogue of the central limit theorem to hold, and we will be able to estimate its value accurately if n is large enough.
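The 'sample of samples' procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: it draws n sets of m independent values, keeps each set maximum, and fits the Gumbel form with `scipy.stats.genextreme`. Note two substitutions: scipy fits by maximum likelihood rather than to a CDF profile, and its shape parameter `c` has the opposite sign to γ in Equation (1). The helper name `estimate_gumbel_params` is hypothetical.

```python
import numpy as np
from scipy.stats import genextreme


def estimate_gumbel_params(draw, n_sets=2000, set_size=1000, seed=0):
    """Estimate (a, b, gamma) of Eq. (1) from a 'sample of samples'.

    draw(rng, shape) must return independent draws from the parent
    distribution; n_sets plays the role of n and set_size that of m.
    """
    rng = np.random.default_rng(seed)
    maxima = draw(rng, (n_sets, set_size)).max(axis=1)  # one maximum per set
    c, loc, scale = genextreme.fit(maxima)              # maximum likelihood
    return loc, scale, -c                               # scipy's c equals -gamma


# Gaussian parent distribution: the limiting gamma is 0
a, b, gamma = estimate_gumbel_params(lambda rng, shape: rng.standard_normal(shape))
print(f"a = {a:.3f}, b = {b:.3f}, gamma = {gamma:.3f}")
```

For a Gaussian parent the fitted γ approaches zero only slowly as m grows, so a small departure from zero at finite m is expected rather than a sign of a problem.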



3 GUMBEL STATISTICS OF CMB MAPS 

To study the extreme statistics of CMB-type maps, we make use of randomly placed circular regions ('patches'). We position a number of such patches on the CMB sky and record the maximum and minimum temperatures within them (Figure 2). For Gumbel statistics to be applicable to this set of extremes, the patches must be large enough to contain sufficient independent temperatures for the analogue of the central limit theorem to hold. This depends on the pixel and patch size, as well as on the map resolution. We take as our CMB data the five-year Internal Linear Combination (ILC) map (Limon et al. 2008). An upper limit to the Fourier resolution (maximum ℓ value) credibly usable in the ILC map is set by foreground emission, so that at ℓ_max > 25 the true CMB signal becomes progressively contaminated (Limon et al. 2008). Another constraint is that there is only a finite CMB sky to sample, so our sets of samples will tend to overlap if patch sizes are too large. The ideal patch size would be large enough to allow sufficient variability, yet small enough to prevent overlapping. In practice, patches of angular radius between 2° and 8° fulfil both these conditions.

Figure 2. Random placement of 600 patches of 4° radius over a CMB map with the processing mask applied.
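The overlap constraint can be made quantitative with a simple area argument. The sketch below (our own illustration; the function names are hypothetical) computes the solid angle 2π(1 − cos θ) of a patch of radius θ and divides the unmasked sky area by it. This is only an upper bound on the number of non-overlapping patches, since circles cannot tile a sphere. The unmasked fraction f_sky = 0.94 is taken from the 6% mask coverage quoted in Section 3.1.

```python
import math


def cap_solid_angle(radius_deg):
    """Solid angle (sr) of a circular patch of angular radius radius_deg."""
    return 2.0 * math.pi * (1.0 - math.cos(math.radians(radius_deg)))


def max_disjoint_patches(radius_deg, f_sky=0.94):
    """Area-based upper bound on the number of non-overlapping patches.

    Ignores packing losses, so real non-overlapping layouts hold fewer.
    """
    return f_sky * 4.0 * math.pi / cap_solid_angle(radius_deg)


for r in (2, 4, 6, 8):
    print(f"{r} deg: at most ~{max_disjoint_patches(r):.0f} patches")
```

The patch numbers used in this paper (e.g. 600 patches of 4°, 100 of 6°) sit below these bounds.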

The ILC map is certainly not ideal: we are forced to use patches on scales small enough that foregrounds can contaminate them. We mitigate this by removing high-spatial-frequency modes from the map, but this will reduce the detectability of any real signals present on small and intermediate scales. A full analysis could employ data from the high-signal frequency bands and a much more conservative mask, but the ILC map serves to illustrate the process and to perform basic tests.



3.1 Methodology 

We test whether Gumbel statistics are applicable to CMB-type data by computing γ for the CMB and for simulated Gaussian maps. Since we will later examine the use of Gumbel statistics in detecting non-Gaussianity (Section 5), we also test simulated maps with a non-Gaussian signal of varying strength, generated using the method of Liguori et al.

Our algorithm is as follows. To prevent the worst contamination from Galactic foreground emission, we first mask the data using the processing mask supplied by the WMAP team (Limon et al. 2008), which covers 6% of the sky. We then randomly place patches of fixed angular radius over the remaining map (Figure 2) and record the maximum and minimum temperature values in each enclosed region. The sets of maxima and minima are then converted into CDF profiles to which a Gumbel distribution (e.g. Equation 1 for the maxima) can be fitted. In this way one obtains the Gumbel parameters a, b and γ for the map in question. Parameter errors are obtained using a simple Monte Carlo process over the whole algorithm. Error bars drawn in subsequent plots correspond to 1σ in total width.
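The masking, patch-placement and fitting steps can be sketched on a toy sky. Everything here is an assumption made for illustration: random pixel positions stand in for a HEALPix pixelisation, uncorrelated white noise stands in for the ILC map (real CMB pixels are correlated), a band |z| < 0.06 (exactly 6% of the sphere) stands in for the WMAP processing mask, and minima are fitted by negating them. The fit uses maximum likelihood via `scipy.stats.genextreme` rather than a CDF-profile fit, with γ = −c. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import genextreme


def random_directions(rng, n):
    """n isotropically distributed unit vectors, shape (n, 3)."""
    v = rng.standard_normal((n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def patch_extrema(pix_vec, temps, valid, n_patches, radius_deg, rng):
    """Max and min of unmasked pixels in randomly placed circular patches.

    pix_vec: (npix, 3) pixel-centre unit vectors; temps: (npix,) values;
    valid: boolean array, False for masked pixels. Centres whose nearest
    pixel is masked are redrawn (a simplifying assumption).
    """
    cos_r = np.cos(np.radians(radius_deg))
    maxima, minima = [], []
    while len(maxima) < n_patches:
        centre = random_directions(rng, 1)[0]
        cos_ang = pix_vec @ centre
        if not valid[np.argmax(cos_ang)]:
            continue  # patch centre falls inside the mask
        inside = (cos_ang >= cos_r) & valid
        maxima.append(temps[inside].max())
        minima.append(temps[inside].min())
    return np.array(maxima), np.array(minima)


def fit_gumbel(extrema):
    """Maximum-likelihood fit of Eq. (1); returns (a, b, gamma), gamma = -c."""
    c, loc, scale = genextreme.fit(extrema)
    return loc, scale, -c


# Toy stand-ins (assumptions): random pixels, white noise, 6% band mask
rng = np.random.default_rng(1)
pix_vec = random_directions(rng, 50_000)
temps = rng.standard_normal(50_000)
valid = np.abs(pix_vec[:, 2]) > 0.06

# Monte Carlo over the whole placement-and-fit procedure
gamma_max, gamma_min = [], []
for trial in range(5):
    mx, mn = patch_extrema(pix_vec, temps, valid, 300, 5.0, rng)
    gamma_max.append(fit_gumbel(mx)[2])
    gamma_min.append(fit_gumbel(-mn)[2])  # minima fitted via negation
print("gamma_max = %.3f +/- %.3f" % (np.mean(gamma_max), np.std(gamma_max)))
print("gamma_min = %.3f +/- %.3f" % (np.mean(gamma_min), np.std(gamma_min)))
```

Redrawing a fresh simulated map inside the Monte Carlo loop, rather than reusing one map, would correspond to the variant used for simulated data in the next paragraph.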

Our simulated maps are realisations of a concordance ΛCDM cosmology with WMAP standard cosmological parameters (Komatsu et al. 2009). Our non-Gaussian maps are generated using the same power spectrum but have a non-Gaussian component with a tunable value of f_NL.

Figure 3. Sample Gumbel fits to the CMB data. Fitted Gumbel PDFs are shown superimposed on the temperature histograms, normalised to unity. The data correspond to: maxima fit using 600 4° patches at ℓ_max = 96 (dot), maxima fit using 1000 2° patches at ℓ_max = 16 (solid), minima fit using 600 4° patches at ℓ_max = 96 (dash).

Analysis of simulated maps is done in the same manner as for the ILC, including the masking of the sky. The only difference is in the Monte Carlo algorithm, where we generate a new simulated map each time the random placement of patches is rerun. By doing so we obtain estimates of the Gumbel parameters for all maps with a particular power spectrum, rather than for a single realisation.

3.2 Results 

3.2.1 Fit Quality 

For both real and simulated data, the collected sets of maxima and minima are very well fitted by the Gumbel form with parameters a, b and γ. Figure 3 shows three sample fits to the ILC data for both maxima and minima; in all cases the fits are close to ideal. This was also the case for simulated maps, both Gaussian and non-Gaussian (examined up to f_NL = 4000). The Gumbel γ parameter therefore seems to be a well-defined statistic characterising a CMB-type map at a given resolution.
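One way to put a number on "close to ideal" is a Kolmogorov-Smirnov distance between the collected extrema and the fitted CDF. This check is our own addition, not one reported in the text, and the data below are a toy stand-in (maxima of sets of independent Gaussian draws). Because the parameters are fitted to the same data, the KS p-value is optimistic and is best read as a descriptive measure.

```python
import numpy as np
from scipy.stats import genextreme, kstest

# Toy stand-in for a set of patch maxima (assumption): maxima of
# 1000 sets of 500 independent Gaussian draws.
rng = np.random.default_rng(2)
maxima = rng.standard_normal((1000, 500)).max(axis=1)

c, loc, scale = genextreme.fit(maxima)  # scipy's c equals -gamma
result = kstest(maxima, genextreme.cdf, args=(c, loc, scale))
print(f"gamma = {-c:.3f}, KS D = {result.statistic:.4f}, p = {result.pvalue:.3f}")
```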

3.2.2 Stability of the Gumbel γ parameter

We next investigate the variation in γ with sample size. We expect γ to be independent of the sample (patch) size once a critical value is reached, since beyond that point one keeps approaching the limiting distribution with increasing accuracy.

We test the stability of γ by repeating the fitting procedure for varying patch size, using in each case the maximum number of patches without oversampling. Results are shown in Figure 4 for patches varied between 2° and 6° in radius, with the total number of patches ranging between 1000 and 100, respectively. Only results for maxima are displayed, since the minima fits lead to similar conclusions. Since they are free of foregrounds, we examine the simulated data over a greater range of ℓ_max.



Figure 4. Gumbel γ versus patch radius plotted for the maxima of the CMB map (solid) and simulated Gaussian maps (dot). The various values of ℓ_max correspond to: ℓ_max = 16 (star), ℓ_max = 32 (square), ℓ_max = 96 (triangle), ℓ_max = 128 (diamond), ℓ_max = 500 (none).

As can be seen from Figure 4, the results are consistent with γ remaining constant over patch size for both the ILC and simulated Gaussian maps, smoothed to a variety of ℓ_max. This result was also found to hold for simulated non-Gaussian maps. We conclude that the Gumbel treatment is valid for the ILC map and for a range of simulated maps with non-Gaussianity spanning f_NL = 0 to f_NL = 4000.

3.2.3 Variation in the values of γ

Over the range of maps studied, for both maxima and minima, the Gumbel treatment invariably yields negative values of γ, corresponding to the Weibull domain in extreme value theory (Beirlant et al. 2004). In this domain, the PDFs of temperature maxima are bounded on the right (Figure 1), whereas those of temperature minima are bounded on the left; i.e. in both cases the tail of interest has a finite endpoint.
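The finite endpoint follows directly from the support condition of Equation (1): with γ < 0, 1 + γ(x − a)/b > 0 requires x < a − b/γ. The sketch below (hypothetical function name; the parameter values are illustrative, not fitted to WMAP data) evaluates it. For minima fitted via negation with parameters (a, b, γ), the corresponding lower bound on the temperature is −(a − b/γ).

```python
def tail_endpoint(a, b, gamma):
    """Finite upper endpoint a - b/gamma of Eq. (1) in the Weibull domain.

    Valid only for gamma < 0; for gamma >= 0 the maxima are unbounded above.
    """
    if gamma >= 0:
        raise ValueError("a finite upper endpoint requires gamma < 0")
    return a - b / gamma


# Illustrative parameter values only (not fitted to WMAP data)
print(tail_endpoint(a=100.0, b=20.0, gamma=-0.4))  # -> 150.0
```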

It is also observed that the γ values for maxima tend to be generally lower, and to vary more with ℓ_max, than those for the minima. In the case of the ILC, Gumbel statistics differentiate positive and negative temperature fluctuations within the data by a clear spread (Figure 5).

This result does not hold for simulated Gaussian maps, where the values for maxima closely mirror those for the minima. This is expected, given that the maps are generated to be symmetric about zero temperature.
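The symmetry argument can be illustrated with independent draws rather than maps (a simplifying assumption). Minima are fitted by negating them, so a negative γ_min means the minima are bounded on the left. For a symmetric Gaussian parent the two shape parameters agree within noise. For a skewed parent, here a Gamma distribution with shape 4 that is bounded below at zero and chosen purely for illustration, they differ: the bounded minima fall in the Weibull domain. The helper `gamma_max_min` is hypothetical and uses scipy's maximum-likelihood fit, with γ = −c.

```python
import numpy as np
from scipy.stats import genextreme


def gamma_max_min(draw, n_sets=2000, set_size=1000, seed=0):
    """Fitted shape parameters (gamma_max, gamma_min) for set maxima and minima."""
    rng = np.random.default_rng(seed)
    sets = draw(rng, (n_sets, set_size))
    c_max, _, _ = genextreme.fit(sets.max(axis=1))
    c_min, _, _ = genextreme.fit(-sets.min(axis=1))  # minima via negation
    return -c_max, -c_min                           # scipy's c equals -gamma


# Symmetric parent: maxima and minima mirror each other
g_max_sym, g_min_sym = gamma_max_min(lambda rng, s: rng.standard_normal(s))
# Skewed parent bounded below at 0 (Gamma, shape 4): they differ
g_max_skew, g_min_skew = gamma_max_min(lambda rng, s: rng.gamma(4.0, size=s))
print(f"symmetric: {g_max_sym:.3f} vs {g_min_sym:.3f}")
print(f"skewed:    {g_max_skew:.3f} vs {g_min_skew:.3f}")
```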



4 COMPARISON WITH GAUSSIAN SIMULATIONS

A pertinent question is whether or not the Gumbel statistics of the CMB differ from statistics collected using simulated Gaussian maps. Any notable difference would be evidence against the 5-year ILC map being a purely Gaussian data set.

To study possible differences, we collect values of γ from the ILC and from simulated Gaussian maps using a fixed patch radius (θ = 4°) and examine their variation with ℓ_max. We present data for four different Fourier resolutions: ℓ_max = 16, 32, 64 and 96. Increasing ℓ_max further would imply using the data at much higher accuracy than recommended by the WMAP team, and is also found to add no significant variation. Using a lower ℓ_max, on the other hand, would reduce the number of independent temperatures inside a patch, invalidating the Gumbel approach.

Figure 5. Gumbel γ versus ℓ_max, plotted for the CMB and simulated Gaussian maps. The data correspond to: CMB full sky (solid), CMB North hemisphere (dash), CMB South hemisphere (dot). Both maxima fits (star) and minima fits (triangle) are given. The grey band corresponds to the full-sky results (maxima, minima) for simulated Gaussian maps, averaged over 100 realisations.

Figure 5 (solid line) shows results for the CMB, which can be compared with the Gaussian data (grey band), collected and averaged over 100 simulated maps. The band therefore indicates where data for a single Gaussian realisation would be expected to lie. Separate results for the CMB North (dash) and South (dot) hemispheres are also plotted.

As can be seen, the CMB data display slight variation with ℓ_max, though not of high statistical significance. More importantly, the CMB data show a spread of several σ between the maxima and minima, which is not present in the simulated Gaussian results.

To quantify the likelihood of this separation between maxima and minima, we generate 1000 Gaussian maps and record values of γ_min − γ_max at ℓ_max = 32. The resulting histogram is shown in Figure 6 (left), with the arrow indicating the corresponding spread in the CMB. On this criterion, the ILC map is still consistent with being a Gaussian data set.
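Comparing the CMB value with the simulated histogram amounts to an empirical tail fraction. The text does not state whether a one- or two-sided comparison is used, so the sketch below offers both; measuring two-sided extremeness as distance from the simulation mean is our assumption, and `simulation_fraction` is a hypothetical name.

```python
import numpy as np


def simulation_fraction(simulated, observed, two_sided=True):
    """Fraction of simulations at least as extreme as the observed statistic.

    simulated: gamma_min - gamma_max from Gaussian realisations;
    observed: the same statistic for the data. Two-sided extremeness is
    measured as distance from the simulation mean (an assumption).
    """
    simulated = np.asarray(simulated, dtype=float)
    if two_sided:
        centre = simulated.mean()
        return np.mean(np.abs(simulated - centre) >= abs(observed - centre))
    return np.mean(simulated >= observed)
```

Applied to the 1000 simulated values of γ_min − γ_max, a small returned fraction would indicate that the CMB spread is rarely reproduced by Gaussian skies.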

On further investigation, the spread in γ within the CMB is found to vary considerably with direction. Rerunning the same analysis with the ILC map split into North and South hemispheres, we find that the values of γ change in magnitude by several σ, and the spread between maxima and minima is actually reversed (Figure 5, dot and dash). The result for the whole sky (solid) thus masks considerable anisotropy within the data set. Variations of this kind are not found in Gaussian maps when averaged over realisations, though individual realisations with a similar North-South asymmetry were encountered.







Figure 6. Left: histogram showing the spread of γ_min − γ_max for 1000 simulated full-sky Gaussian maps, examined at ℓ_max = 32. The arrow indicates the CMB value. Right: histogram showing the range of γ_min − γ_max for 1000 simulated Gaussian hemispheres (ℓ_max = 32). The arrows indicate the range of CMB values, as seen in Figure 7.



To survey this anisotropy within the CMB, we rotate the North-South axis to varying orientations and repeat the analysis for the corresponding 'North' hemisphere only. Results are shown in Figure 7. Here, the spread in γ for each rotated hemisphere (γ_min − γ_max measured at ℓ_max = 32) is shown as the value of the pixel located along the direction of the rotated 'North' axis.
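The rotated-hemisphere survey reduces to selecting, for each candidate axis n̂, the pixels with p̂ · n̂ > 0 and recomputing a statistic on them. The sketch below is illustrative only: the axis grid is a hypothetical coarse latitude-longitude grid, pixels are random unit vectors, and `statistic` is a placeholder. In the analysis described here it would run the patch-and-fit procedure restricted to the selected pixels and return γ_min − γ_max; the usage below substitutes a trivial mean.

```python
import numpy as np


def axis_grid(n_theta=18, n_phi=36):
    """Hypothetical coarse grid of candidate 'North' axes as unit vectors."""
    theta = np.radians(np.linspace(5.0, 175.0, n_theta))  # colatitude
    phi = np.radians(np.linspace(0.0, 360.0, n_phi, endpoint=False))
    t, p = np.meshgrid(theta, phi, indexing="ij")
    vecs = np.stack([np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)], axis=-1)
    return vecs.reshape(-1, 3)


def north_hemisphere(pix_vec, axis):
    """Boolean selection of pixels in the hemisphere centred on `axis`."""
    return pix_vec @ axis > 0.0


def hemisphere_map(pix_vec, temps, valid, statistic, axes):
    """statistic(temps, selection) for the 'North' hemisphere of each axis."""
    return np.array([statistic(temps, valid & north_hemisphere(pix_vec, ax))
                     for ax in axes])


# Toy usage: random pixels, with the z coordinate as a stand-in "temperature"
rng = np.random.default_rng(0)
pix = rng.standard_normal((20_000, 3))
pix /= np.linalg.norm(pix, axis=1, keepdims=True)
values = hemisphere_map(pix, pix[:, 2], np.ones(len(pix), dtype=bool),
                        lambda t, sel: t[sel].mean(), axis_grid(6, 12))
print(values.reshape(6, 12).round(2))
```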

The value of γ_min − γ_max in a hemisphere is thus seen to vary between −0.11 and 0.31, depending on orientation. To compare with Gaussian maps, we record the same statistic for 1000 simulated hemispheres, as shown in Figure 6 (right). We thus find that the variation in γ_min − γ_max within the CMB is of magnitude comparable to that in simulated Gaussian maps, though the strongest signal would only be reproduced in < 10% of the simulated maps.

Figure 7 also shows well-known CMB anomalies that have previously been reported (Land & Magueijo 2005; Eriksen et al. 2004; Vielva et al. 2004). The alignment that maximizes the Gumbel discriminant is seen to lie, like the Axis of Evil (Land & Magueijo 2005), close to the equator of another previously reported anomaly: the anisotropic division of power between the ecliptic hemispheres (Eriksen et al. 2003).

In summary, having examined the Gumbel statistics of the CMB, we report considerable variation in the values of γ within the data set. The data are consistent with Gaussianity, though the observed signal is only reproduced in < 10% of the simulated Gaussian maps.



5 NON-GAUSSIANITY LIMITS 

Having established the validity of the Gumbel approach as applied to the CMB and simulated data (Section 3), we further probe its usefulness in detecting non-Gaussian CMB signals generated by quadratic perturbations in the primordial potential field, parametrised by f_NL. For a review of previous work, see Yadav & Wandelt (2008), Vielva et al. (2004) and Mukherjee et al. (2004).
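The local-type quadratic perturbation can be illustrated pointwise: Φ = φ + f_NL(φ² − ⟨φ²⟩), with φ Gaussian. This toy is an assumption-laden sketch, not the map-making method of Liguori et al. used here. It ignores the transfer from the primordial potential to CMB temperature, sign conventions for Φ differ between papers, and the amplitude 3×10⁻⁵ is an illustrative choice. It shows how the quadratic term skews an otherwise symmetric field; for small ε = f_NL σ the skewness is close to 6ε.

```python
import numpy as np


def local_fnl(phi_gauss, f_nl):
    """Toy local-type non-Gaussianity: Phi = phi + f_NL * (phi**2 - <phi**2>)."""
    return phi_gauss + f_nl * (phi_gauss**2 - np.mean(phi_gauss**2))


def skewness(x):
    """Sample skewness: third central moment over variance**1.5."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5


rng = np.random.default_rng(3)
phi = 3e-5 * rng.standard_normal(1_000_000)  # assumed amplitude, illustrative only
for f_nl in (0, 1000, 4000):
    print(f_nl, round(float(skewness(local_fnl(phi, f_nl))), 3))
```

A skewed parent field breaks the symmetry between hot and cold extremes, which is the kind of effect the comparison of γ_max and γ_min is sensitive to.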
Figure 7. A map recording γ_min − γ_max for the CMB (ℓ_max = 32), as measured using hemispheres of different angular orientation. The pixel position corresponds to the orientation of the rotated North axis. Angular positions of the Ecliptic and Galactic poles are shown superimposed (NEP, SEP, NGP, SGP), as are the location of the Virgo cluster and the orientation of the CMB dipole (Komatsu et al. 2009). Three known CMB anomalies are also indicated: the cut of maximum hemispheric power asymmetry (Eriksen et al. 2004), the Axis of Evil (Land & Magueijo 2005) and the Cold Spot (Vielva et al. 2004).



To begin, we examine the variation of γ with f_NL using high-resolution simulated maps with ℓ_max = 500. We vary f_NL over the range 0 to 4500, a region over which γ has been tested to be a reliable parameter. Figure 8 plots the results.

Figure 8. Gumbel γ as obtained for maxima of simulated non-Gaussian maps (ℓ_max = 500) of varying f_NL. Results are plotted for patch sizes 2° (star) and 4° (diamond).

We observe a monotonic increase in γ as f_NL is varied from 1000 to 3000, after which point the value seems to stabilise. A similar, though less steep, monotonic increase is found when plotting values of γ for Gumbel minima.

The size of the error bars on the γ parameter in Figure 8 constrains the lowest signal likely to be detected by this method to approximately f_NL ≈ 2000. The procedure seems to afford no way of obtaining more precise γ estimates, which would allow us to detect lower f_NL.

The ILC exhibits some divergence from the Gaussian simulations at about the 10% level (see above). There are three possible explanations for this. One possibility is residual impact from foregrounds, even at these large scales. Secondly, the non-Gaussian signature might be of a form not well described by f_NL. The third possibility is a simple statistical fluctuation.



6 CONCLUSION

We have introduced the statistics of sample extremes, Gumbel statistics, to the study of CMB maps. We have shown that these statistics provide a good description of the WMAP data set despite the highly correlated fluctuations therein.

We have investigated the use of Gumbel statistics in non-Gaussianity detection, the principal advantage of our method being its lack of restrictions on the underlying statistics of the CMB. The simplest methods based on Gumbel statistics are unlikely to detect any f_NL < 1000. There are weak hints of non-Gaussianity at low ℓ, as we can only reproduce the observed statistics some 10% of the time in Gaussian realisations, but no hard evidence. We also find that there is a preferred hemispheric cut that optimises the Gumbel discriminant. We emphasize that f_NL is a limited and non-generic description of non-Gaussianity. Gumbel statistics can provide a potential probe of generic non-Gaussianity in CMB data sets.